---
title: 01b02 RNA Velocity Data
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/01 Datasets/01b02 Single-Cell Data and RNA Velocity.ipynb"
---
RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. The authors of the RNA velocity method expect it to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.
This notebook loads single-cell datasets with RNA velocity information. Each loader function returns two $n \times d$ torch tensors and a list. The first tensor holds the $n$ cells with $d$ features each; the second holds the velocity vector of each of those $n$ cells in the same $d$ dimensions. The list holds the ground-truth cell type label of each of the $n$ cells.
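The return contract described above can be checked with a small helper. This is only a minimal sketch: check_velocity_triplet is a hypothetical name, not a function from this package, and the toy values and labels are made up for illustration. It relies only on len() and indexing, so it should accept torch tensors as well as the nested lists used here.

```python
def check_velocity_triplet(X, flows, labels):
    """Validate the (data, flows, labels) contract and return (n, d).

    Hypothetical helper: uses only len() and indexing, so torch tensors,
    NumPy arrays, and nested lists are all accepted.
    """
    n = len(X)
    if n == 0:
        raise ValueError("X contains no cells")
    d = len(X[0])
    # flows must pair one d-dimensional vector with every cell in X
    if len(flows) != n or any(len(row) != d for row in flows):
        raise ValueError("flows must have the same n x d shape as X")
    # labels must provide one cell type per cell
    if len(labels) != n:
        raise ValueError("labels must have one entry per cell")
    return n, d


# Toy stand-in for three cells with two features each (made-up values).
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
flows = [[1.0, 0.0], [1.0, -0.5], [0.5, -0.5]]
labels = ["Ductal", "Ductal", "Beta"]
n, d = check_velocity_triplet(X, flows, labels)
```

A check like this is cheap to run right after loading, before the tensors are handed to a plotting or training routine.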
Call one of the following scvelo dataset loaders and pass the returned object to rnavelo or rnavelo_pcs to get the data, flows, and labels (plus n_pcs from rnavelo_pcs) in processed form.
- scvelo.datasets.pancreas()
- scvelo.datasets.bonemarrow()
- scvelo.datasets.dentategyrus()
- scvelo.datasets.dentategyrus_lamanno()
- scvelo.datasets.gastrulation_e75()
- scvelo.datasets.gastrulation_erythroid()
- scvelo.datasets.forebrain()
- scvelo.datasets.gastrulation()
- scvelo.datasets.pbmc68k()
- scvelo.datasets.simulation(n_obs=300, n_vars=None, alpha=None, beta=None, gamma=None, alpha_=None, t_max=None, noise_model='normal', noise_level=1, switches=None, random_seed=0)

adata = scv.datasets.bonemarrow()
rnavelo_plot_pca(adata)
adata = scv.datasets.pancreas()
X, flows, labels = rnavelo(adata)
X2, flows2, labels2, n_pcs = rnavelo_pcs(adata)
print(X.shape)
print(flows.shape)
print(labels.shape)
print(X2.shape)
print(flows2.shape)
print(labels2.shape)
print(n_pcs)
rnavelo_plot_pca(adata)
adata = scv.datasets.dentategyrus()
X, flows, labels = rnavelo(adata)
scv.tl.velocity_graph(adata)
X, flows, labels, n_pcs = rnavelo_pcs(adata)
rnavelo_plot_pca(adata)
The following cell gets the dataset, preprocesses it (including computing velocities), computes a transition probability matrix (the velocity graph), and then uses this to compute a low-dimensional embedding (using UMAP).
adata = scv.datasets.pancreas()
rnavelo_preprocess(adata)
# compute the velocity graph, then project the velocities onto the 2D UMAP embedding
scv.tl.velocity_graph(adata)
scv.tl.velocity_embedding(adata, basis='umap')
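To make the "transition probability matrix" step above more concrete, here is a simplified sketch of the idea: a cell is likely to transition to another cell when its velocity points toward that cell. This is not scvelo's implementation. The function names, the scale parameter, and the choice to compare against all other cells (rather than nearest neighbors only) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def transition_matrix(X, V, scale=10.0):
    """Row-normalized transition probabilities from positions X and velocities V.

    Illustrative only: the weight of i -> j is exp(scale * cos(V[i], X[j] - X[i])),
    normalized over all j != i. `scale` is an assumed sharpness parameter.
    """
    n = len(X)
    P = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # no self-transitions in this sketch
                continue
            delta = [xj - xi for xi, xj in zip(X[i], X[j])]
            weights.append(math.exp(scale * cosine(V[i], delta)))
        total = sum(weights)
        P.append([w / total for w in weights] if total > 0 else weights)
    return P


# Three cells on a line, all moving to the right.
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
V = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
P = transition_matrix(X, V)
```

In this toy case the middle cell assigns almost all of its probability to the cell on its right, since its velocity points in that direction.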
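Here's the annotated UMAP embedding. The points are embedded using UMAP, and the arrows are projected onto this embedding from the velocity graph's transition probabilities, which is a simple heuristic rather than an exact mapping.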
scv.pl.velocity_embedding_stream(adata, basis='umap')
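One way to picture how arrows can be placed on a 2D embedding is as an expected displacement: for each cell, average the unit directions toward other cells, weighted by the transition probabilities. The sketch below illustrates that idea under simplifying assumptions. project_velocities is a hypothetical name, and scvelo's own routine differs in its details.

```python
import math


def project_velocities(Y, P):
    """Project velocities onto an embedding Y using transition probabilities P.

    Illustrative only: the arrow of cell i is sum_j P[i][j] * unit(Y[j] - Y[i]).
    """
    n, dim = len(Y), len(Y[0])
    V_emb = []
    for i in range(n):
        v = [0.0] * dim
        for j in range(n):
            if i == j or P[i][j] == 0.0:
                continue
            delta = [yj - yi for yi, yj in zip(Y[i], Y[j])]
            norm = math.sqrt(sum(c * c for c in delta))
            if norm == 0.0:
                continue  # coincident points contribute no direction
            for k in range(dim):
                v[k] += P[i][j] * delta[k] / norm
        V_emb.append(v)
    return V_emb


# Toy embedding of three cells with a deterministic cycle 0 -> 1 -> 2 -> 0.
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
V_emb = project_velocities(Y, P)
```

Because each displacement is normalized, the arrow lengths reflect how consistently the transitions point in one direction, not the distance between points in the embedding.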
We want to extract the points and use them with our plotting functions, for consistency. Let's see the available keys:
adata.obsm
X = torch.tensor(adata.obsm["X_umap"].copy())
flows = torch.tensor(adata.obsm["velocity_umap"].copy())
labels = rnavelo_add_labels(adata)
from FRED.datasets import plot_directed_2d
plot_directed_2d(X,flows,labels, minimal=True)
Repeating this process with the bone marrow dataset. For reasons I have yet to diagnose, scvelo automatically computes the UMAP embedding of the pancreas dataset, but opts to compute the t-SNE embedding of the bone marrow dataset.
adata = scv.datasets.bonemarrow()
rnavelo_preprocess(adata)
# compute the velocity graph (transition probabilities between cells)
scv.tl.velocity_graph(adata)
We can see which embedding it computed by running:
adata.obsm
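Since the available embedding differs between datasets, the basis can be chosen from the keys instead of being hard-coded. The helpers below are a hypothetical sketch (available_bases and pick_basis are not part of scvelo or this package). They assume embeddings are stored under keys of the form X_<basis>, as with the X_umap and X_tsne keys used in this notebook.

```python
def available_bases(obsm_keys):
    """Return the embedding names stored under keys of the form 'X_<basis>'."""
    return sorted(key[len("X_"):] for key in obsm_keys if key.startswith("X_"))


def pick_basis(obsm_keys, preferred=("umap", "tsne")):
    """Return the first preferred 2D basis that is present, or None."""
    bases = available_bases(obsm_keys)
    for name in preferred:
        if name in bases:
            return name
    return None


# Illustrative key lists, not read from the actual datasets.
print(pick_basis(["X_pca", "X_umap"]))  # umap
print(pick_basis(["X_pca", "X_tsne"]))  # tsne
```

With an AnnData object, pick_basis(adata.obsm.keys()) would give the string to pass as the basis argument of scv.tl.velocity_embedding and scv.pl.velocity_embedding_stream.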
scv.tl.velocity_embedding(adata, basis='tsne')
scv.pl.velocity_embedding_stream(adata, basis='tsne')
As before, we extract the points and flows to use them with our plotting functions, for consistency.
X = torch.tensor(adata.obsm["X_tsne"].copy())
flows = torch.tensor(adata.obsm["velocity_tsne"].copy())
labels = rnavelo_add_labels(adata)
from FRED.datasets import plot_directed_2d
plot_directed_2d(X,flows,labels, minimal=True)
adata = scv.datasets.dentategyrus()
rnavelo_preprocess(adata)
# compute the velocity graph (transition probabilities between cells)
scv.tl.velocity_graph(adata)
We can see which embedding it computed by running:
adata.obsm
scv.tl.velocity_embedding(adata, basis='umap')
scv.pl.velocity_embedding_stream(adata, basis='umap')
As before, we extract the points and flows to use them with our plotting functions, for consistency.
X = torch.tensor(adata.obsm["X_umap"].copy())
flows = torch.tensor(adata.obsm["velocity_umap"].copy())
labels = rnavelo_add_labels(adata)
from FRED.datasets import plot_directed_2d
plot_directed_2d(X,flows,labels, minimal=True)